Cross Modal Evaluation of High Quality Emotional Speech Synthesis with the Virtual Human Toolkit

Authors

  • Blaise Potard
  • Matthew P. Aylett
  • David A. Baude
Abstract

Emotional expression is a key requirement for intelligent virtual agents. For an agent to produce dynamic spoken content, speech synthesis is required. However, despite substantial work with pre-recorded prompts, very little work has explored the combined effect of high quality emotional speech synthesis and facial expression. In this paper we offer a baseline evaluation of the naturalness and emotional range available by combining the freely available SmartBody component of the Virtual Human Toolkit (VHTK) with the CereVoice text-to-speech (TTS) system. Results echo previous work using pre-recorded prompts: the visual modality is dominant and the modalities do not interact. This allows the speech synthesis to add gradual changes to the perceived emotion in terms of both valence and activation. The naturalness reported is good, 3.54 on a 5-point MOS scale.

Similar resources

Synthesising and Evaluating Cross-Modal Emotional Ambiguity in Virtual Agents

Emotional ambiguity, when more than one emotion appears present at a given time, or several emotions are superimposed, is common in human interaction and effects such as irony can be intentionally created through a mismatch of such emotional signals. High quality emotional speech synthesis offers a means for testing the effect of combining differences in vocal emotion, facial expression and tex...

Emotional speech synthesis for emotionally-rich virtual worlds

This paper aims to give a brief overview of the current state of the art in emotional speech synthesis in view of a multi-modal context. After a brief introduction into the concept of text-to-speech synthesis, two approaches to the expression of emotions in speech synthesis are described. The categorical approach models emotions as discrete categories and is able to provide high-quality emotion...

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One interesting topic in the multimedia domain concerns enabling computers to produce speech. Speech synthesis grants computers the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

Command Speech Interface to Virtual Reality Applications

Over the last five years, several attempts to develop speech interfaces, especially for simulation applications, have emerged due to recent improvements in speech and language technology and the complexity of those applications' interfaces. We describe our approach to controlling Virtual Reality applications via voice and GUI, through the creation of a simple multimodal command speech interface based on dialog mo...

Anthropomorphic Agent as an Integrating Platform of Audio-Visual Information

One of the ultimate human-machine interfaces is an anthropomorphic spoken dialog agent that behaves like a human, with facial animation and gesture, and holds spoken conversations with humans. Among the numerous efforts devoted to this goal, the Galatea Project, conducted by 17 members from 12 universities, is developing an open-source, license-free software toolkit [1] for building an anthropomorphic spoken dia...

Publication date: 2016